Abstract: The basic architecture of the relational associative processor
(RAP) is summarized. RAP is designed to support relational data bases and
is currently being implemented at the University of Toronto. In this paper,
three important aspects of the architecture are discussed. These include
concurrency in query processing, virtual extension of the RAP processor
memory, and a mechanism for bulk information output. The architectural
features are presented and analytical evaluations are provided. It is
demonstrated that the architectural optimizations presented here are
essential for the efficient on-line operation of data base processors
similar to RAP.

1. INTRODUCTION

There has been a considerable amount of research in the area
of computer architectures for non-numeric processing. Several
systems have been proposed and some of these have been
implemented. It is only recently, however, that research
involving this class of machines has focused on meeting the
demands of modern Data Base Management Systems (DBMS). RAP is a
processor designed specifically for the needs of DBMS.


RAP is an associative processor composed of an array of cells.
Each cell uses a circulating serial memory and a microprocessor
that supports the operations of DBMS. RAP is designed to avoid
the difficulties of using conventional software and hardware to
implement data base management systems. The present version of
RAP is used as a backend processor to a conventional computer,
which provides a user interface, compiles RAP programs, and
monitors RAP operation.

The RAP architecture includes several features for the
specific purpose of performance enhancement. These features
include concurrency of request processing, virtual extension of
processor memory, and bulk information output. All three of
these have the common purpose of increasing the performance and
capacity of RAP. The cost of these features in terms of
additional hardware required is very low relative to the factors
by which they increase the productivity of RAP.

The basic architecture of RAP is outlined below, then the
optimization features are presented individually. In each
presentation, the feature is described, then evaluated and
justified. For each feature, the analysis indicates ways in
which RAP can and should be tuned to meet the demands of a
specific information system environment.

1.1  Basic  organization  of  the  RAP  processor 

Although RAP is thoroughly documented elsewhere [1,2,3,4], we
also describe it briefly here. Figure 1.1 shows an overview of
the RAP architecture. The present version of RAP is designed as
a peripheral processor, although its structure lends itself to a
modular extension towards becoming an autonomous processor. RAP
communicates with its support processor to receive its data base
contents and compiled programs, and to return the results of
users' requests.

RAP  is  composed  of  a  controller,  a  set  function  unit  for 
statistical  computations  on  sets  of  values,  and  an  array  of  cells 
that  operate  in  parallel.  A  cell  consists  of  a  memory  component 
and  a  logic  component.  The  memory  component  can  be  one  track  of 
a  rotating  magnetic  memory  device  such  as  a  disk  or  drum,  or  some 
form  of  circulating  sequential  memory  of  the  magnetic  bubble  or 
charge  coupled  (semiconductor)  type.  The  logic  component  is  a 
microprocessor,  which  operates  on  the  memory  as  a  "search 
machine",  directs  data  manipulation,  and  performs  some  of  the 
limited  numeric  computations  required  in  data  base  processing. 
The  set  function  unit  is  used  to  combine  the  set  function  results 
computed  in  the  cells  to  obtain  a  value  for  the  total  memory 
contents. The controller is responsible for overall
coordination. It sends control sequences to the cells to
initiate instruction execution, controls the set function unit,
and executes directly some of the RAP primitives. It also
controls data transfers between RAP and the support processor.

Figure 1.1 Overview of RAP architecture


1.2 Cell organization

Each cell consists of a rotating (or circulating) memory, a
buffer, an information search and manipulation unit (ISMU), and
an arithmetic logic unit (ALU). The ISMU and ALU are the two
major blocks of the cell microprocessor. The basic logic blocks
are displayed in figure 1.2. Each cell lies on a serial
communication path and receives status signals from its
predecessor and sends signals to its successor. During certain
RAP operations (i.e., mappings between relations) cells send data
to each other. The cell also has connections for input/output
operations and for signals that are passed between the controller
and set function unit.

1.2.1  Rotating  (circulating)  memory 

If the cell memory is a track of a rotating disk or drum,
data is read and written via fixed heads (one pair for each
cell) while the memory rotates under these heads. It takes one
revolution of the memory for its contents to be read from one end
to the other. This revolution time is called the "rotation
time". If the cell memory is a circulating sequential memory,
the memory operations are the same except that read/write heads
are replaced by other electronic or magnetic mechanisms and the
rotation time is called the "circulation time".


With current integrated circuit technology, the cell logic can
handle bit densities as high as one bit every 100 nanoseconds.

Figure 1.2 Overview of cell architecture

1.2.2 Buffer

As  the  memory  rotates  under  the  heads,  it  is  read,  circulated 
through  logic,  and  written  back  after  a  time  delay.  This  delay 
is  directly  proportional  to  the  length  of  a  buffer  placed  between 
the  read  and  write  heads.  While  in  the  buffer,  data  may  be 
operated  on  by  cell  logic.  This  buffer  must  be  sufficiently  long 
to  store  most  large  tuples.  Approximately  1024  bits  is  expected 
to  be  sufficient  in  most  applications.  However,  this  chosen 
length  should  not  be  taken  as  a  restriction  since  the  modular 
design  of  the  system  permits  insertion  of  longer  buffers. 

1.2.3  Data  structure  and  storage  format 

The physical data structure chosen for RAP is referred to as
a RAP relation. A RAP relation is Codd's normalized relation [5]
except that duplicate tuples are allowed and several special
domains called "mark bits" are added. Figure 1.3 shows a RAP
relation. There is a restriction on the degree (i.e., number of
domains) of a RAP relation. It cannot be higher than a number,
pmax, which depends on the sizes of the domains of the relation.
The number pmax can be increased by using a longer buffer. Due
to the variation in sizes of RAP domains, the range for a RAP
relation's maximum degree is constrained by 28 < pmax < 98 for a
buffer length of 1024 bits.

Figure 1.3 A RAP relation and a RAP tuple

Due to the linear nature of a memory track, a direct mapping
of the data structure requires that it be linearized. The
relational structure lends itself to a linear form easily, as can
be seen in the storage format shown in figure 1.4. The tuples of
a relation, which are themselves linear representations of domain
values, can be stored one after another on a track. This
structure is similar to a file on a conventional disk. The first
two data blocks in each cell contain relation and domain names
respectively and act as "header" blocks. Each succeeding block
contains a tuple (row) of the relation. The stored blocks are
separated by gaps. The positions of names in the domain names
block determine the positions of the values in the following
tuple blocks.

The items in tuples are of two types: numeric and non-numeric.
Items can have one of three discrete lengths. In the prototype
being implemented, lengths of 8, 16 and 32 bits have been chosen.

If  a  relation  has  too  many  tuples  to  be  stored  in  one  cell, 
then  several  cells  are  used.  The  hardware  design  requires  that 
the  relation  and  domain  names  be  repeated  on  each  cell  of  a 
relation  and  no  cell  may  hold  tuples  from  more  than  one  relation. 
Cells  devoted  to  one  relation  are  not  required  to  be  physically 
contiguous  since  the  relation  held  by  each  track  is  identified 
explicitly.  In  general,  cells  of  several  relations  are 
physically  intermixed. 

Figure 1.4 Storage format

As shown in figure 1.3 there are five to nine reserved bit
positions at the beginning of each tuple. The first bit is the
delete flag (DF). If this bit is on, it indicates that the tuple
has been deleted and garbage collection hardware may "erase" the
tuple. The remaining bits are the mark domains mentioned
earlier. They resemble a user work area or can be thought of as
selectors allowing the results of one RAP instruction to be used
by another. This greatly extends the associative capability of
RAP. A tuple is said to be T-marked, T being a given combination
of mark bits, if the specified mark bits are all turned on.
Likewise, a tuple is said to be T-unmarked if the T subset of bits
are all turned off. If some of the T subset of bits are on and
some are off, then the tuple is neither T-marked nor T-unmarked.
There are several instructions for marking tuples and using the
markings as extra data values for selecting sets of tuples.
Complex queries on RAP may lead to long sequences of RAP
instructions, the intermediate results of which are recorded in
the mark domains of tuples.
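
The T-marked and T-unmarked tests can be illustrated with bit masks. The sketch below is only an illustration: it assumes the mark bits of a tuple are held in a small integer and that T is a mask over those bits. The function names and the bit layout are hypothetical and do not describe RAP's actual encoding.

```python
def is_t_marked(marks: int, t: int) -> bool:
    """True when every mark bit selected by mask t is on."""
    return (marks & t) == t


def is_t_unmarked(marks: int, t: int) -> bool:
    """True when every mark bit selected by mask t is off."""
    return (marks & t) == 0


def mark(marks: int, t: int) -> int:
    """Turn on the mark bits selected by t (hypothetical helper)."""
    return marks | t


def unmark(marks: int, t: int) -> int:
    """Turn off the mark bits selected by t (hypothetical helper)."""
    return marks & ~t
```

With T = 0b101, a tuple whose marks are 0b100 is neither T-marked nor T-unmarked, which is the third case described above.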

The functions of storage allocation and garbage collection
are accomplished directly in the hardware. Each cell packs the
tuples towards the beginning of the cell by "short circuiting"
deleted tuples as they pass through the buffer. This accumulates
the unused space at the end of the cell. New tuples are first
inserted in the unused space of cells dedicated to the relation,
and, if more space is required, a new cell is initialized and
used to store the rest of the tuples. Garbage collection takes
place automatically in the background whenever the cell would
otherwise be idle.
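
The packing and insertion policy just described can be sketched in software. This is a simplified model, not the hardware mechanism: a cell is assumed to be a list of slots, each slot a dictionary with a `deleted` flag, and `capacity` is an assumed number of tuples per cell. All names are hypothetical.

```python
def compact(cell):
    """Drop deleted slots so live tuples are packed at the front."""
    return [slot for slot in cell if not slot["deleted"]]


def insert_tuples(cells, capacity, new_tuples):
    """Fill free space in existing cells, then start new cells as needed."""
    cells = [compact(cell) for cell in cells]
    pending = list(new_tuples)
    for cell in cells:
        while pending and len(cell) < capacity:
            cell.append({"deleted": False, "values": pending.pop(0)})
    while pending:
        fresh = []  # a newly initialized cell for the same relation
        while pending and len(fresh) < capacity:
            fresh.append({"deleted": False, "values": pending.pop(0)})
        cells.append(fresh)
    return cells
```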

1.2.4 Information search and manipulation unit (ISMU)

The information search and manipulation unit is responsible
for inter-cell communication, responding to the commands sent
from the controller, evaluation of data search criteria, I/O data
transfers, and control of the ALU for data modifications.

1.2.5  Arithmetic  logic  unit  (ALU) 

The arithmetic logic unit contains a serial adder,
multiplier, and control logic for carrying out an arithmetic or
replacement update. Logic for intermediate set function
calculations of sum, count, maximum, and minimum is also
included.

1.3  Set  function  unit  (SFU) 

The  set  function  unit  provides  the  logic  to  calculate  overall 
results  of  the  set  functions  count,  sum,  maximum,  minimum,  and 
average  over  the  entire  RAP  memory.  Intermediate  results  are 
stored in individual ALU registers of the cells. Arithmetic is
accomplished  through  a  serial  adder  and  a  divider  which  work  on 
the  intermediate  results  as  they  are  collected  from  the  cells. 
The  SFU  includes  the  logic  for  polling  the  cells  and  transmitting 
the  results  to  the  controller. 
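
The two-stage computation (intermediate results per cell, then a combining step in the SFU) can be sketched as follows. This is an illustrative software model only. The dictionary layout and function names are assumptions, and the sketch says nothing about the serial arithmetic used in the hardware.

```python
def cell_partial(values):
    """Intermediate set-function results computed within one cell."""
    return {
        "count": len(values),
        "sum": sum(values),
        "max": max(values) if values else None,
        "min": min(values) if values else None,
    }


def combine(partials):
    """Combine per-cell partial results into results for the whole memory."""
    nonempty = [p for p in partials if p["count"] > 0]
    count = sum(p["count"] for p in partials)
    total = sum(p["sum"] for p in partials)
    return {
        "count": count,
        "sum": total,
        "max": max(p["max"] for p in nonempty) if nonempty else None,
        "min": min(p["min"] for p in nonempty) if nonempty else None,
        # The average needs a division, done once on the combined totals.
        "average": total / count if count else None,
    }
```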

1.4 Controller

The controller is responsible for the overall coordination of
the cell micro-processors. It loads the search criteria units of
the ISMUs, energizes opcode and mode lines, senses the end of an
operation, and repeats the cycle for the next RAP instruction.
During control sequences of the controller, instruction operands
and data are retrieved from controller storage and bussed onto
cell lines in data block gap intervals. The controller also
directly executes certain RAP primitives such as decision and
transfer commands. Other operations, such as providing data
transfers between RAP and its support processor and doing error
checking, which are common tasks in most I/O controllers, are
also included among the functions of the RAP controller.

1.5 Instruction set


The RAP system has a machine-oriented yet high level and
complete instruction set for querying and manipulating its data
base. These instructions are provided directly by hardware. The
instruction set can be divided into the following major data base
operation command types:

a)  retrieval, 

b)  update, 

c)  set  function, 

d)  insertion  and  deletion, 

e)  data  base  creation  and  destruction,  and 

f)  decision  and  transfer. 

Each command is formulated as an "assembler" language
instruction for the RAP machine. One assembler instruction is
translated into a single RAP machine instruction which initiates
the hard-wired control in the cell logic circuits. The data base
contents are manipulated directly on their storage by the cell
micro-processors. The search criterion is transmitted to all of
the cells simultaneously. It is evaluated as the memory contents
are circulated through the cell logic. Values from selected
tuples are immediately manipulated or read out. A query can be
made up of one or more RAP instructions. Most instructions are
executed within one rotation of the entire RAP memory contents.
Since RAP has a parallel organization and its memory is
associative, the response time is usually independent of data
base size or contents. However, response time does depend on
query complexity.

2.  THE  RAP  CONCURRENCY  FACILITY 

2.1  Description  of  the  RAP  concurrency  facility 

In  almost  all  environments,  there  is  likely  to  be  a  high 
variance  in  the  processing  times  required  for  various  requests 
(queries) to RAP. Many short requests will require a few RAP
revolutions, while some complex requests may require several
hundred revolutions. It is undesirable to keep many short
requests waiting during the execution of a very long request.
Thus it is natural to introduce priority scheduling so that
requests that are expected to be short are initiated ahead of
those that are expected to be long. The change from first-come,
first-served initiation of requests to shortest-expected-length-
first initiation is trivially accomplished by a modification of
the scheduling algorithm in the support processor software. No
change to RAP itself is required.
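
A shortest-expected-length-first queue of the kind described can be sketched with a binary heap. This is a minimal illustration of the scheduling idea, not the support processor's actual software. The class name, method names, and the use of an expected revolution count as the priority key are assumptions.

```python
import heapq
from itertools import count


class ShortestFirstQueue:
    """Non-preemptive queue that initiates the shortest expected request first."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker: equal lengths stay first-come

    def submit(self, name, expected_revolutions):
        heapq.heappush(
            self._heap, (expected_revolutions, next(self._order), name)
        )

    def next_request(self):
        """Return the next request to initiate, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```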

However, a further step toward providing quick response to
short requests is possible. The arrival of a short request might
be made to cause the interruption (preemption) of a long request
in progress. In this case, the request that is preempted can be
handled in either of two ways. It may be restarted at a later
time, or it may be resumed at a later time. In the former case,
all effort invested in a request before it is preempted is lost.
This results in a reduction of RAP's total throughput capacity
since some time is wasted. In the latter case, it is possible to
avoid the waste of effort, but only by adding hardware to RAP
for the purpose of recording the state of progress of any
preempted request.

Consider establishing exactly two classes of requests:
class-1 includes all update requests and those retrievals that
require only a few revolutions, while class-2 includes only
retrievals that require many revolutions. With just two classes,
the additional hardware required for preemptive-resume scheduling
is limited to a duplication of the marking logic hardware. A
redundant set of mark bits in each tuple plus a modest amount of
logic in each cell to associate one set of mark bits with class-1
requests and the other set of mark bits with a class-2 request
(in progress or suspended by preemption) is required. Figure 2.1
illustrates the separation of the request stream into class-1
(non-preemptible) and class-2 (preemptible) requests in the
support processor. A RAP tuple with the two sets of mark bits
can be seen in figure 1.3.

With two sets of marking logic provided in the hardware, only
one request can be suspended at a time. Thus a preempted class-2
request returns to the front of the class-2 queue. If more
replications of the marking logic were provided, more priority
levels could be accommodated.

In summary, the RAP concurrency facility uses a two class
preemptive-resume scheduling policy in order to keep short
requests from being delayed substantially by long requests. The
fact that preempted requests are resumed, not restarted,
guarantees acceptable progress for the longer requests.
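
The two-class preemptive-resume policy can be illustrated with a small simulation that advances one RAP revolution per step. This is a sketch under simplifying assumptions: preemption is assumed possible at every revolution boundary, request lengths are known exactly, and all names are hypothetical. It models only the queueing policy, not the mark-bit hardware.

```python
from collections import deque


def run_two_class(arrivals, horizon):
    """Simulate two-class preemptive-resume scheduling.

    arrivals: list of (arrival_time, name, klass, length_in_revolutions).
    Returns a dict mapping request name to its completion time.
    """
    pending = deque(sorted(arrivals))
    class1, class2 = deque(), deque()
    remaining, finish = {}, {}
    current = None  # (name, klass) of the request in service
    for t in range(horizon):
        while pending and pending[0][0] <= t:
            _, name, klass, length = pending.popleft()
            remaining[name] = length
            (class1 if klass == 1 else class2).append(name)
        # A waiting class-1 request preempts a class-2 request in service;
        # the suspended request goes back to the FRONT of the class-2 queue.
        if current is not None and current[1] == 2 and class1:
            class2.appendleft(current[0])
            current = None
        if current is None:
            if class1:
                current = (class1.popleft(), 1)
            elif class2:
                current = (class2.popleft(), 2)
        if current is not None:
            name = current[0]
            remaining[name] -= 1  # work done so far is kept (resume)
            if remaining[name] == 0:
                finish[name] = t + 1
                current = None
    return finish
```

In a run with a 10-revolution class-2 request arriving at time 0 and a 2-revolution class-1 request arriving at time 3, the short request completes at time 5 and the long one at time 12, so no class-2 work is repeated.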

The choice of a preemptive-resume scheduling discipline for
RAP despite the associated cost of additional hardware is
justified by the comparisons below. Consider four candidate
scheduling disciplines:

a)  FCFS  (no  priority)  , 

b)  non-preemptive  priority, 

c)  preemptive-resume,  and 

d)  preemptive-restart. 

All disciplines except preemptive-resume can be implemented
without requiring special hardware in RAP. However, we will show
that for a wide range of parameter distributions, the use of the
preemptive-resume discipline causes mean flow times (expected
time in system) for class-1 requests to be substantially less
than those under FCFS or non-preemptive priority. Also, mean
flow times for class-2 requests will be much less under
preemptive-resume than under preemptive-restart.

2.2 Request characteristics


2.2.1  Request  arrivals 

Figure 2.2 Job arrival rates with respect to queues (class-1 queue and class-2 queue)

Figure 2.2 indicates the significant parameters associated
with the process of request arrivals. Since requests will be
generated by a reasonably large population of independent users,
we assume that requests arrive in a Poisson stream. The
following parameters should be known:

L = Poisson arrival rate of requests,

U = proportion of update requests,

SR = proportion of retrievals that are "short",

SU = proportion of updates that are "short".

From L, U, and SR, we may derive:

L1 = L*U + L*SR*(1 - U) = arrival rate of class-1 requests
L2 = L*(1 - SR)*(1 - U) = arrival rate of class-2 requests
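
The split of the request stream into the two class arrival rates can be checked with a few lines of code. The function name is hypothetical. The arguments are the total arrival rate L, the proportion of updates U, and the proportion of retrievals that are short SR, as defined above.

```python
def class_arrival_rates(L, U, SR):
    """Return (L1, L2), the class-1 and class-2 arrival rates.

    Class-1 receives all updates plus the short retrievals;
    class-2 receives the long retrievals only.
    """
    L1 = L * U + L * SR * (1 - U)
    L2 = L * (1 - SR) * (1 - U)
    return L1, L2
```

Because every request falls into exactly one class, L1 + L2 equals L for any valid U and SR.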

2.2.2  Request  processing  times 

An analysis presented in [2] derives estimates of request
processing time distributions by considering expected query
characteristics and the resulting sequences of RAP instructions.
The processing times of short requests in typical on-line
(non-batch) environments are estimated to have a mean processing
time (u) of eight RAP revolutions. Similarly, the processing time
distribution of long jobs is estimated to have a mean (v) of 100
RAP revolutions.

Class-1 requests

Class-1 consists of short retrievals and all updates (long or
short). If exponential processing time distributions are assumed
for both long and short requests, then the overall class-1
request processing time distribution is hyperexponential:

f(x) = a*(1/u)*exp(-x/u) + (1 - a)*(1/v)*exp(-x/v)

where

u = mean processing time for short requests,
v = mean processing time for long requests, and
a = proportion of short requests within all class-1 jobs
  = {SU*U + SR*(1 - U)}/{U + SR*(1 - U)}.
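
The mixing proportion a and the hyperexponential density can be evaluated numerically. The sketch below is illustrative only. The function names are hypothetical, and SU denotes the proportion of updates that are short.

```python
import math


def short_fraction(U, SR, SU):
    """Proportion a of short requests among all class-1 requests."""
    return (SU * U + SR * (1 - U)) / (U + SR * (1 - U))


def hyperexp_pdf(x, a, u, v):
    """Two-branch hyperexponential density with branch means u and v."""
    return a * (1 / u) * math.exp(-x / u) + (1 - a) * (1 / v) * math.exp(-x / v)
```

Integrating x times the density over all x gives the mean a*u + (1 - a)*v, which is a convenient check on any chosen parameter values.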

Class-2 requests

The distribution of processing times for class-2 may have
lower variance than that for class-1. Thus we will consider both
exponential and Erlang-2 forms.

The mean processing times for classes 1 and 2 respectively
will then be:

m1 = a*u + (1 - a)*v

m2 = v.

Total system workload is given by
W = L1*m1 + L2*m2.

Note that when W>1, requests arrive at a faster rate than
they are completed so that at least class-2 requests must have a
non-finite expected time in system for any scheduling discipline.
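
The class means and the total workload W follow directly from these definitions. The sketch below uses hypothetical function names and takes the class arrival rates L1 and L2 as inputs, in requests per RAP revolution.

```python
def class1_mean(a, u, v):
    """Mean class-1 processing time m1 = a*u + (1 - a)*v."""
    return a * u + (1 - a) * v


def workload(L1, L2, m1, m2):
    """Total system workload W = L1*m1 + L2*m2."""
    return L1 * m1 + L2 * m2


def is_overloaded(W):
    """True when requests arrive faster than they can be completed."""
    return W > 1
```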

2.3 Comparison of scheduling disciplines

Appendix A displays expressions for expected time in system
(mean flow time) for each request class under various scheduling
disciplines. The expressions are derived from the general
formulae given elsewhere [6,7]. Curves of times in system as a
function of W, the total system load, are plotted, as the total
arrival rate, L, and the processing time distributions are
varied. Figure 2.3 shows expected time in system for class-1
requests under each of the four disciplines. Time in system is
expressed in RAP revolutions. There are two sets of curves in
the figure with class-2 requests having exponential and Erlang-2
distributions, respectively.

Figure 2.3 Expected time in system for class-1 jobs

Class-1 requests have a hyperexponential distribution since
short requests are mixed with long updates. The following
parameter values are used in the plots:

U = .4 (proportion of updates)

SR = .5 (proportion of retrievals that are short)

SU = .825 (proportion of updates that are short)
u = 8 (average length of short requests)
v = 100 (average length of long requests)

From these values, we can calculate:

a = 0.9 (proportion of short requests in class-1)
m1 = 17.2 revolutions (class-1 mean)
m2 = 100 revolutions (class-2 mean)

These parameter settings are made to reflect an environment
reasonably balanced between retrievals and updates, with a number
of long retrievals, and a high proportion of short requests [2].
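
To give a feel for the comparison, the sketch below evaluates mean flow times for three of the four disciplines using the standard textbook results for a single-server queue with Poisson arrivals and two priority classes. These are assumed here as stand-ins and are not necessarily in the same form as the expressions of Appendix A. Preemptive-restart is omitted because its analysis is more involved. The function names are hypothetical. s1 and s2 are the second moments of the class-1 and class-2 processing times.

```python
def second_moment_hyperexp(a, u, v):
    """E[S^2] for a two-branch hyperexponential with branch means u, v."""
    return 2 * (a * u * u + (1 - a) * v * v)


def second_moment_erlang2(mean):
    """E[S^2] for an Erlang-2 distribution with the given mean."""
    return 1.5 * mean * mean


def flow_times(L1, L2, m1, m2, s1, s2):
    """Return {discipline: (class-1 mean flow time, class-2 mean flow time)}."""
    rho1, rho2 = L1 * m1, L2 * m2
    rho = rho1 + rho2
    if rho >= 1:
        raise ValueError("workload W must be below 1")
    w0 = (L1 * s1 + L2 * s2) / 2  # mean residual work seen on arrival
    fcfs_wait = w0 / (1 - rho)
    return {
        "fcfs": (m1 + fcfs_wait, m2 + fcfs_wait),
        "non_preemptive": (
            m1 + w0 / (1 - rho1),
            m2 + w0 / ((1 - rho1) * (1 - rho)),
        ),
        "preemptive_resume": (
            m1 + L1 * s1 / (2 * (1 - rho1)),
            m2 / (1 - rho1) + w0 / ((1 - rho1) * (1 - rho)),
        ),
    }
```

With the parameter values listed above and a total workload of W = 0.5, these formulas place class-1 flow time lowest under preemptive-resume, then non-preemptive priority, then FCFS, while class-2 flow time is only modestly higher under preemptive-resume than under non-preemptive priority.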

Figures 2.4 and 2.5 show respectively the expected time in
system for class-2 jobs and the expected overall time in system.
The curves are plotted with the same parameter values used in
figure 2.3. (Because of the tightness of space, only the more
critical Erlang-2 distribution is considered for class-2.)

The same experiment was repeated varying first the proportion
of long retrievals and then the difference between the mean
processing times of the job classes. The relative performance
among the scheduling disciplines considered did not change.

Figure 2.4 Expected time in system for class-2 jobs

Figure 2.5 Expected overall time in system

2.4 Observations

From figure 2.3, it can be seen that expected time in system
for class-1 requests is considerably lower under the preemptive
scheduling disciplines than under FCFS or even non-preemptive
priority. Figure 2.4 shows that class-2 requests do better under
the  non-preemptive  scheduling  disciplines.  However,  preemptive- 
resume  scheduling  hurts  them  very  little,  while  preemptive- 
restart  increases  their  expected  time  in  system  substantially  and 
also  causes  the  system  to  become  overloaded  (expected  time  in 
system  becomes  infinite)  with  a  much  lower  total  workload. 
Figure  2.5  shows  overall  expected  time  in  system.  Non-preemptive 
priority  and  preemptive-resume  are  clearly  better  than  the  other 
disciplines,  particularly  at  heavy  workloads.  Preemptive-resume 
is  significantly  better  than  non-preemptive  priority  at  medium 
workloads. In many environments, rapid response to class-1
requests is more important than rapid response to class-2
requests.  When  this  is  true,  the  degree  of  preference  for 
preemptive-resume  scheduling  lies  somewhere  between  that 
indicated  in  figure  2.3  and  in  figure  2.5  since  we  wish  to  give 
stronger  consideration  to  class-1  requests. 

The preemptive-resume discipline requires an increase in
hardware cost of about 10% for providing extra mark bits and
logic. The large performance advantage of preemptive-resume
scheduling that was observed above justifies the selection of the
preemptive-resume discipline in a system where the cost of delay
in responding to requests is significant. The selection of the
preemptive-resume discipline remains valid even if some small
portion of class-2 jobs is run on a restart basis. This can be
necessitated when a long retrieval is interrupted by an update
that modifies a domain of a particular relation on which the
selection criterion of the retrieval depends.

The differences between the class-2 times in system for the
two preemptive disciplines are even greater with the exponential
distribution. Therefore, regardless of the form of the
processing time distributions, it appears that the improvement
that is achieved with the RAP concurrency facility is sufficient
to justify its cost.

3. THE RAP VIRTUAL MEMORY SYSTEM

3.1 RAP memory capacity relative to data base size

RAP is expected to have a reasonably large memory capacity of
10^8 to 10^9 bits of encoded data. However, whatever the actual
RAP memory size may be, there will always be cases where the RAP
memory capacity is inadequate for a particular data base. There
are three situations we wish to consider here:

a) RAP capacity suffices to hold the entire data base.

b)  Each  individual  relation  can  be  held  in  RAP,  but  not 
all  relations  simultaneously. 

c)  One  or  more  individual  relations  of  the  data  base  are 
each  larger  than  RAP  capacity. 

For the first situation, a virtual memory system is not
necessary.  A  minimum  system  configuration  consisting  of  a  RAP 
connected  by  a  channel  to  its  support  processor  allows  the  data 
base  to  be  RAP-resident. 

In  the  second  situation,  it  is  necessary  to  load  into  RAP 
memory  only  those  relations  required  for  responding  to  a 
particular query. Clearly, there are many questions of strategy
in attempting to minimize the movement of relations to and from
RAP. We shall address these in detail in the next section.

Consider,  finally,  the  third  situation.  Any  relation  that  is 
larger  than  RAP's  capacity  can  be  partitioned  into  smaller  pieces 
that each fit into RAP. Each query program that deals with such
a  relation  must  then  be  modified  (by  software  in  the  support 
processor)  to  deal  with  each  partition  of  the  relation 
individually.  The  sequence  of  RAP  subprograms  that  each  deal 
with  one  partition  must  be  followed  by  a  final  subprogram  that 
combines  the  results  of  its  predecessors. 

For a multi-relational query, such as a mapping between two
relations neither of which fits into RAP, the following procedure
is required. One relation, say RA, is partitioned into na parts
and the other, say RB, into nb parts so that one partition of RA
and one partition of RB fit simultaneously in RAP memory. Then
several RAP subprograms, each of which does a mapping between one
partition of RA and one partition of RB, can be generated so that
the entire request is accomplished in na*nb loadings of the RAP
memory.
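
The partition-pair procedure can be sketched as a simple enumeration. The function below is a hypothetical illustration of how support processor software might list the subprograms. It only generates the (RA partition, RB partition) pairs and does not model the mapping itself or the final combining step.

```python
def mapping_schedule(na, nb):
    """List every (RA partition, RB partition) pair, one per RAP loading."""
    return [(i, j) for i in range(na) for j in range(nb)]
```

For na = 3 and nb = 4 this yields 12 pairs, matching the na*nb count.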

3.2 The RAP virtual memory system description

The  virtual  memory  facility  provided  in  RAP  makes  RAP  useful 
for  very  large  data  base  applications  where  the  data  base  size 
exceeds  RAP's  memory  capacity.  Figure  3.1  shows  an  overall 
system  configuration  including  peripherals  and  special  hardware 
for  a  RAP  system  that  includes  the  virtual  memory  facility.  In 
this  configuration,  the  data  base  resides  in  the  bulk  memory  (any 
conventional  secondary  storage  device)  attached  to  the  support 
processor  and  only  the  collection  of  relations  required  by  some 
number  of  queries  in  the  RAP  processor's  queue  are  contained  in 
the  RAP  memory.  The  RAP  memory  consists  of  a  pair  of  tracks 
assigned to each cell. As indicated in the lower portion of
figure 3.1, at any given time, one track of each pair is
connected  to  the  cell  logic  while  the  other  acts  as  a  buffer 
connected  to  a  conventional  I/O  controller.  In  other  words, 
logically,  the  set  of  tracks  can  be  viewed  as  two  memory  devices, 
one  being  the  primary  RAP  memory  and  the  other  being  the  buffer. 
Track  connections  are  reversed  as  needed,  that  is,  whenever 
buffer  memory  contents  are  required  by  RAP  for  processing,  or 
when  updated  contents  are  to  be  copied  back  to  the  bulk  store  via 
the  conventional  I/O  device.  Track  pairs  interchange  roles 
independently  so  that  each  device  may  hold  some  buffers  and  some 
active  cells  at  any  given  time.  The  RAP  memory  is  connected  to 
the  support  processor  by  a  dedicated  channel.  The  buffer  is 
connected  to  a  conventional  I/O  controller  which  is  in  turn 
attached  to  a  separate  channel.  The  channel  provides  a  link 
between  the  buffer  and  the  bulk  memory  through  the  support 
processor. This configuration makes it possible to perform
input-output on both RAP and the buffer device at the same time.
It permits the loading and unloading of relations to and from the
buffer while queries are processed in the RAP memory. The
loading and unloading of the buffer can be accomplished by
standard channel programs of the support processor.

Figure 3.1 RAP virtual memory system configuration

Figure 3.2 depicts the structure of the track pair in a single
cell. In order to swap track pairs, both the RAP processing and
the buffer I/O operations must first be completed. Then a special
signal is produced in the RAP cells which are to be swapped. The
actual swapping of tracks is effected by a SWITCH command issued
by the conventional I/O controller under the direction of the
monitor.
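
The swapping rule for a single track pair can be modelled as follows. This is a minimal sketch, not the RAP hardware logic: the class name `TrackPair` and its fields are hypothetical, and the only behaviour it captures is the condition stated above, namely that the roles are exchanged only after both RAP processing and buffer I/O have completed.

```python
from dataclasses import dataclass

@dataclass
class TrackPair:
    """The two tracks of one cell: one wired to the cell logic, one acting as buffer."""
    active: int = 0                 # index (0 or 1) of the track connected to the cell logic
    processing_busy: bool = False   # RAP processing still running on the active track
    io_busy: bool = False           # buffer load/unload still running

    @property
    def buffer(self) -> int:
        """Index of the track currently connected to the I/O controller."""
        return 1 - self.active

    def switch(self) -> bool:
        """Exchange the roles of the two tracks; refused while either side is busy."""
        if self.processing_busy or self.io_busy:
            return False
        self.active = self.buffer
        return True
```

Because each pair carries its own state, pairs can change roles independently, which matches the remark that each logical device may hold some buffers and some active cells at any given time.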

3.3 The RAP virtual memory system operation

The RAP virtual memory system moves cell memory size portions
of relations to and from RAP memory. This is appropriate since a
cell is the smallest processable piece of a relation on RAP and
it can also be made a unit of I/O transfer with adequate buffer
space in the support processor. The movement of cell memory size
portions of the data base contents will be referred to as
loading/unloading hereafter.

The strategy for selecting portions of relations to be loaded
into RAP is critical to the operation of the virtual memory
system. This strategy, as well as the operation of the monitor
of the virtual memory system, is discussed below.

3.3.1 The selection strategy

The main goal of the virtual memory is to overlap the loading
and unloading of RAP with RAP processing so that loading and
unloading cause as few delays as possible. As explained
earlier, requests in RAP are processed using two queues: one for
non-preemptible (fast) requests and the other for preemptible
(slow) requests. With this scheduling discipline, one reasonable
selection strategy is the following. The relations required by
each request in the class-1 queue are known. The requests in
this queue are reordered to reduce the amount of loading and
unloading of RAP memory required between requests. This request
order is then partitioned into subsets in such a way that all
relations required by a subset can reside simultaneously in RAP
memory. This approach tends to reduce the number of times a
particular relation must be moved to or from RAP memory.
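
The partitioning of the request order into subsets can be illustrated with a simple greedy pass. The sketch below is an assumption about one reasonable way to do it, not the procedure used in RAP: it takes the requests as already reordered, represents each request by the set of relation names it needs, and measures relation sizes and memory capacity in cells. The names `form_subsets`, `sizes` and `capacity` are hypothetical.

```python
def form_subsets(requests, sizes, capacity):
    """Group an ordered list of requests into consecutive subsets.

    requests: list of sets of relation names, already in processing order.
    sizes:    dict mapping relation name -> number of cells it occupies.
    capacity: number of cells available in RAP memory.
    A single request larger than `capacity` is not handled here; it would
    need the relation partitioning described earlier.
    """
    subsets, current, resident = [], [], set()
    for req in requests:
        needed = resident | req
        if current and sum(sizes[r] for r in needed) > capacity:
            subsets.append(current)          # close the subset: this request does not fit
            current, needed = [], set(req)
        current.append(req)
        resident = needed
    if current:
        subsets.append(current)
    return subsets
```

Requests that share relations extend the current subset at no extra memory cost, so a request order that clusters requests on the same relations produces fewer, larger subsets.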

Having formed the subsets, the overlapping of RAP processing
with loading and unloading is achieved in the following manner.
Before giving control of the RAP processor to a request subset,
the monitor in the support processor looks ahead and initiates
operations to load the relations required by the following subset
into the buffer tracks. If any buffer track contains updated
tuples, it is unloaded to the bulk store before it is
overwritten. After initiating the loading, the monitor starts
the processing of the current request subset on RAP. In this way
processing is overlapped with the unloading and loading of RAP
memory. The system cannot be delayed by loading and unloading as
long as the total processing time of the subset being processed
is longer than the time to unload updated relations and to load
the relations of the next subset. Thus, it is desirable that the
number of requests per subset be as large as possible.
Therefore, the extent of performance gain using the proposed
virtual memory facility depends on the extent to which it is
possible to group together sets of requests that require the same
relations. Assuming that there is a backlog of requests waiting
to be processed, the avoidance of delays due to loading and
unloading will result in both lower average time in system and
higher throughput for RAP.
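
The no-delay condition stated above can be checked numerically. The following sketch assumes that each subset is described by three hypothetical figures in a common time unit (for example RAP revolutions): its own processing time, and the unload and load times needed to prepare the next subset while it runs. RAP waits only for the part of the I/O that processing does not hide.

```python
def total_stall(subsets):
    """Sum the time RAP sits idle waiting for buffer I/O.

    subsets: list of (processing_time, unload_time, load_time) tuples, where
    unload_time and load_time are the I/O needed to prepare the NEXT subset
    while this one is being processed.
    """
    stall = 0.0
    for processing, unload, load in subsets:
        io = unload + load
        stall += max(0.0, io - processing)   # I/O is fully hidden when processing >= io
    return stall
```

For example, `total_stall([(10, 2, 5), (4, 3, 3)])` charges no delay to the first subset and a delay of 2 to the second, whose processing time is shorter than its overlapped I/O.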

As far as the queue of class-2 requests is concerned, a
different procedure is necessary. Since a request is likely to
be preempted several times before completion, it would not be wise
to employ look-ahead buffering for the class-2 queue. Whatever
space in RAP memory is not required to hold relations for the
current subset of class-1 requests should be devoted to retaining
relations for the first request in the class-2 queue. However,
if this is not possible, the relations of the preempted class-2
request would be reloaded.

The choice of which cells to load into is not difficult since
full knowledge is available about what relations are required by
each request. The priority with which portions of relations keep
their places in RAP memory is proportional to the proximity to
the front of the class-1 queue of the first request requiring
that relation.

3.3.2  The  monitor 

As seen in the configuration of Figure 3.1, a monitor,
provided by software in the support processor, is needed to
coordinate the job queues, RAP, and the buffers. The information
required by the monitor for carrying out its functions includes
the following. For each request, the class must be known and
each relation required must be identified. Additionally, we must
know whether the request is currently in service and whether or
not it has been preempted but not completed. For each relation
it must be known in which cells of RAP memory it currently
resides and where its remaining portions are in bulk memory.
Finally, for each cell of RAP memory, it must be known whether
the current contents have been updated since being loaded.
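
The bookkeeping listed above maps naturally onto three small records. The sketch below is one possible layout, assuming Python dataclasses; the class and field names are hypothetical and are not taken from the RAP monitor. The helper `cells_to_unload` shows how the per-cell update flag identifies which resident portions of a relation must be written back.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    request_class: int                 # 1 = non-preemptible, 2 = preemptible
    relations: set                     # names of the relations the request needs
    in_service: bool = False
    preempted: bool = False            # preempted but not yet completed

@dataclass
class Relation:
    name: str
    resident_cells: set = field(default_factory=set)   # RAP cells holding portions of it
    bulk_pages: list = field(default_factory=list)     # bulk-memory locations of the rest

@dataclass
class Cell:
    updated: bool = False              # contents changed since being loaded

def cells_to_unload(relation, cells):
    """Return the resident cells of `relation` whose contents must be written back."""
    return sorted(c for c in relation.resident_cells if cells[c].updated)
```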

Some commands and procedures of the monitor as currently
implemented are outlined below. The monitor commands initiate
RAP processing and buffer loading and unloading operations.
PAGE-SET-DEL is the last instruction of each job subset and
indicates that relations just processed for the current request
subset may be paged out. The SWITCH command interchanges the
roles of the pairs of memory tracks for selected cells. The LOAD
command initiates input/output operations to load relations
required by the next request subset from the bulk memory into the
buffer. The UNLOAD command writes cells of updated relations
back onto the bulk memory. The commands SWITCH, LOAD, and UNLOAD
are sent to RAP before the processing of a subset of requests
commences. If several relations have to be unloaded, switched
and loaded, then several SWITCH, UNLOAD, and LOAD commands must
be issued in sequence.

The monitor procedures control buffer loading and unloading
operations, analyze relation requirements of job subsets, and
control RAP processing. Procedures called POST, POST-PRE, and
TEST manipulate and test input-output status indicators
associated with each cell. These status indicators, with the aid
of the TEST procedure, are used to assure successful loading of
required relations before request processing is initiated. The
procedure PAGE-ROUTINE determines what relations required for the
following request subset are not currently in RAP memory, and
thus must be loaded before request processing can be initiated.
Finally, the procedures INITIATE-RAP and INITIATE-RAP-RESUME
cause a sequence of RAP machine instructions and associated data
to be passed to the RAP controller so that RAP processing can be
initiated. The pseudo code in Table 1 shows how the monitor must
insert commands into the stream of request subsets in order to
cause loading and unloading to be overlapped with processing.
The word "page" in the code below refers to cell memory size
portions of relations. Whenever request processing ends, a
driver procedure is entered with a flag, called CLASS-FLAG,
indicating the class of the terminated request. The driver then
passes control to the corresponding procedure, which takes
necessary steps for continuation of the operation of the system.
TABLE 1 - Monitor control structure

driver:
    case statement on CLASS-FLAG
        FLAG 1:
            call nonpreemptible-jobs-class1
        FLAG 2:
            call preemptible-jobs-class2
    end
end

nonpreemptible-jobs-class1:
    call TEST-WAIT               (are pages ready?)
    SWITCH RN                    (get them into primary memory)
    if UPDATED                   (any page-outs for copying?)
    then do
        UNLOAD RN
        call POST
    end
    call INITIATE-RAP            (initiate process)
    call PAGE-ROUTINE            (examine the following subset)
    call TEST-WAIT               (is unload finished?)
    LOAD RN1                     (page in the next relation RN1)
    call POST
end

preemptible-jobs-class2:
    call TEST-WAIT
    if PREEMPTED
    then do
        if ¬(PAGED-OUT)
        then do
            call INITIATE-RAP-RESUME
            return
        end
    end
    else
        FLAG-PROG = 1
    call PAGE-ROUTINE
    LOAD RN1
    call POST-PREM
    SWITCH RN
    if UPDATED
    then do
        UNLOAD RN
        call POST
    end
    if FLAG-PROG = 1
    then do
        call INITIATE-RAP-RESUME
        return
        FLAG-PROG = 0
    end
    else
        call INITIATE-RAP-RESUME
end
3.4 Performance improvement with the RAP virtual memory system

A simulation model of this proposed virtual memory
configuration has been used to assess the value of the virtual
memory facility [4]. The simulation model incorporated a single
RAP with the buffer memory for providing the virtual memory
system, and a conventional bulk storage device, both controlled
by the support processor. In all the simulation experiments, the
workload was assumed to consist of a high proportion of short
requests with nearly as many updates as retrievals. The RAP
contained 200 cell memory pairs, and each cell had a capacity of
500,000 bits. The parameters that were varied in the simulation
study were:

1) data base size (1 billion bits and 2.5 billion bits)

2) relation sizes (2.5 million bits to 12.5 million bits)

3) average processing time of short requests (4 to 32 RAP
   revolutions)

4) locality (high, medium and low)

The processing time distribution for short requests was
exponential and for long requests was Erlang-2 (with a mean of
104 revolutions). On average, long requests involved three
relations, while short requests involved 1.5 relations. Varying
degrees of locality were achieved by letting the probabilities
with which relations are selected for use in a request be
anywhere from uniform over all relations to highly biased in
favor of a few relations.

The effects of the above parameters on the expected
time-in-system of class-1 requests were observed. It was found that
system performance was improved with the virtual memory facility
whenever even moderate locality of relation reference existed
(that is, whenever some of the relations had a higher probability
of being referenced than others). Also, system performance
tended to be better when relations were small, and when request
processing times were long relative to loading/unloading times
[4].


4. RAP OUTPUT MECHANISM


4.1 Description of the RAP read-out scheme

The RAP read-out scheme is based on maximizing the utilization
of the serial data path from the cells to the controller. A RAP
read instruction specifies certain domains of a specified
relation that are to be read for each tuple that satisfies a
given qualification condition. Thus anywhere from zero tuples to
all tuples in a relation may qualify to have certain domains read
out. If the qualification involves conditions on only mark
domains, then the read-out starts immediately. Otherwise, a
marking revolution takes place first and is followed by reading
revolutions. In RAP, since selection is associative, all
qualified tuples of a relation are located simultaneously.
Highly parallel read-out mechanisms that permit all eligible
tuples to be read out at once are prohibitive in cost.
Therefore, the RAP output scheme is based on maximizing the data
transmission rate of the standard bit serial output channel. It
reads one qualified tuple (if there is one on any cell) for each
tuple position. Extra revolutions occur only when two or more
cells contain eligible tuples at the same tuple position. The
RAP read-out scheme uses an output polling mechanism which
multiplexes information from the cells into the serial output
channel. Figure 4.1 shows the overall configuration of the RAP
read-out scheme.

As can be seen, the polling device can have multiple levels.
In the RAP design, a new level would be added when the number of
cells per group exceeds 64 [2]. With multi-level polling the
time required to locate an eligible tuple is reduced so that
polling can be completed in the inter-tuple-position gap. With
this configuration it is easy to convert the system to a
partially parallel output scheme by removing the outer polling
level(s) and connecting each group of cells to a separate output
channel.

Normally, in any group of cells, not all of the cells are
eligible. Cells are ineligible when they do not hold tuples of
the relation being read or when the qualification condition is
not satisfied by the tuple at the current tuple position. At
each tuple position, the first cell (if any) that holds a tuple
qualifying for read-out is identified and the tuple is read out.

Figure 4.1 RAP read-out scheme

The outer polling, called the channel polling, works in the
same way as the cell group polling described above. In order to
understand the analogy between the two levels, a cell group
should be thought of as a tuple position (which can be eligible
or ineligible) on a cell. Then the channel polling is similar to
cell group polling with each cell group appearing to be a single
cell. At a given tuple position there can be several eligible
tuples waiting to be read, but the channel polling mechanism
chooses only one of them. The cell group polling arbitrates
among the cells of a group and the channel polling arbitrates
among cell groups.
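
The two arbitration levels can be sketched as nested "first eligible wins" searches. This is an illustrative model only: it assumes fixed-priority polling at both levels (the text states this for cells within a group and says channel polling works the same way), and the function names and the boolean-list representation are hypothetical.

```python
def poll_group(eligible):
    """Cell group polling: index of the first eligible cell in the group, or None."""
    for i, flag in enumerate(eligible):
        if flag:
            return i
    return None

def poll_channel(groups):
    """Two-level polling at one tuple position.

    groups: list of cell groups, each a list of booleans (True = the cell holds
    an eligible tuple at the current tuple position).
    Returns (group index, cell index) of the single tuple selected, or None.
    """
    for g, cells in enumerate(groups):
        winner = poll_group(cells)       # arbitration among the cells of one group
        if winner is not None:
            return g, winner             # channel polling picks the first such group
    return None
```

Any other eligible tuples at the same position are left for a later revolution, which is the source of the extra revolutions discussed in the evaluation.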

4.2 Evaluation of the RAP read-out scheme

In order to evaluate the RAP read-out mechanism, we use the
following model of eligible tuples. Let n be the number of cells
in a relation and p be the number of tuple positions in each
cell. Then we can imagine the n*p tuples of the relation forming
an n by p array, each row representing a cell and each column
representing a tuple position.

The most obvious approach to reading out eligible tuples is
to take, in turn, each cell that holds any eligible tuples and
read off all eligible tuples on that cell. With this method,
read-out takes one revolution for each cell containing an
eligible tuple. Since a RAP cell typically holds about 1000
tuples [2], this approach will cause read-out to take n
revolutions even if only about 1% of the tuples are eligible,
because nearly every cell will then hold at least one eligible
tuple.

Rather  than  passing  over  all  of  each  cell,  it  is  obviously 
necessary  only  to  scan  until  no  eligible  tuple  remains  on  the 
cell.  When  the  expected  number  of  eligible  tuples  per  track  is 
k,  then  approximately  k/(k+1)  is  the  proportion  of  each  cell  that 
must  be  scanned.  Thus,  at  most  a  factor  of  two  in  read-out  time 
is  saved  over  the  previous  case,  and  that  occurs  only  when  there 
are  very  few  tuples  eligible. 
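
The k/(k+1) figure can be motivated by a short calculation, under the simplifying assumption (not stated in the text) that the k eligible tuples fall at independent, uniformly distributed positions u_1, ..., u_k along a track of unit length. The scan may stop at the last eligible tuple, whose expected position is:

```latex
\Pr\Big[\max_{i} u_i \le u\Big] = u^{k}, \qquad
E\Big[\max_{i} u_i\Big] = \int_0^1 u \cdot k\,u^{k-1}\, du = \frac{k}{k+1}.
```

With k = 1 this gives 1/2, the factor-of-two saving mentioned above; with k = 10 it is already 10/11, so almost the whole track is scanned.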

The RAP read-out mechanism, as described in the previous
section, causes one tuple to be transferred in every tuple
position in which any eligible tuple remains. Thus, the total
number of revolutions required is equal to the maximum number of
eligible tuples that happen to fall at any single tuple position.
This number is also referred to as the maximum collision length.
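
The difference between the cell-at-a-time approach and the polled read-out can be seen on a small n by p array of eligibility flags. The sketch below follows the model of this section (rows are cells, columns are tuple positions); the function names are hypothetical.

```python
def revolutions_cell_at_a_time(eligible):
    """One revolution for every cell that holds at least one eligible tuple."""
    return sum(1 for row in eligible if any(row))

def revolutions_polled(eligible):
    """One tuple per tuple position per revolution: the maximum collision length.

    eligible[c][t] is True when cell c holds an eligible tuple at position t.
    """
    positions = len(eligible[0]) if eligible else 0
    return max((sum(row[t] for row in eligible) for t in range(positions)), default=0)
```

When three cells each hold one eligible tuple at three different positions, the first method needs three revolutions and the polled scheme needs one; when all three tuples share a position, both need three.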

We  will  determine  the  maximum  collision  length  in  each  of  two 
situations.  In  the  first,  we  assume  that  each  tuple 
independently  has  probability  k/(n*p)  of  being  eligible.  In  the 
second,  we  assume  that  exactly  k  of  the  n*p  tuples  are  eligible. 

In the first case, it is straightforward to calculate the
expected maximum collision length. Let x = k/(n*p). Then

\[ \Pr[\,j \text{ eligible in a specified tuple position}\,] = \binom{n}{j} x^{j} (1-x)^{n-j} \]

\[ \Pr[\,j \text{ or fewer eligible in a specified tuple position}\,] = \sum_{i=0}^{j} \binom{n}{i} x^{i} (1-x)^{n-i} \]

\[ \Pr[\,j \text{ or fewer in all } p \text{ tuple positions}\,] = \Big[ \sum_{i=0}^{j} \binom{n}{i} x^{i} (1-x)^{n-i} \Big]^{p} = P(j,n,p) \]

\[ \Pr[\,j \text{ is the maximum collision length}\,] = P(j,n,p) - P(j-1,n,p) \]

Therefore, the expected maximum collision length is

\[ E[\text{max collision length}] = \sum_{j=1}^{n} j\,\big[ P(j,n,p) - P(j-1,n,p) \big] = n\,P(n,n,p) - \sum_{j=0}^{n-1} P(j,n,p) \]
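
The closed form for the first case is easy to evaluate directly. The sketch below assumes the independent-eligibility model of this case, with every tuple eligible with probability x = k/(n*p); the function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def p_at_most(j, n, p, x):
    """P(j, n, p): probability that no tuple position holds more than j eligible tuples."""
    if j < 0:
        return 0.0
    per_position = sum(comb(n, i) * x**i * (1 - x)**(n - i) for i in range(j + 1))
    return per_position ** p

def expected_max_collision(n, p, k):
    """Expected maximum collision length when each tuple is eligible with prob k/(n*p)."""
    x = k / (n * p)
    # n * P(n, n, p) minus the sum of P(j, n, p) for j = 0 .. n-1
    return n * p_at_most(n, n, p, x) - sum(p_at_most(j, n, p, x) for j in range(n))
```

As a sanity check, with a single tuple position (p = 1) the result reduces to the mean of a binomial distribution, n*x = k.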

In the second approach, we assume that we know that exactly k
tuples are eligible, rather than knowing only that the expected
number of eligible tuples is k. This tends to decrease the
expected maximum collision length.

There are \( \binom{np}{k} \) equally likely ways of picking the k tuples
that qualify. We must determine, for each value of j from 1 to
n, the proportion of these ways that lead to j being the maximum
number of eligible tuples at any single tuple position. Figure
4.2 displays an algorithm for calculating the expected maximum
collision length.


For each possible maximum collision length, the algorithm
calculates the number of configurations of eligible tuples that
yield that maximum collision length. Each partition of the
integer k that has the maximum collision length as the largest
element, and has exactly p elements (some of which may be zero)
is considered. The elements of the partition are the numbers of
eligible tuples in various tuple positions. For each partition,
the algorithm counts the number of distinct ways of mapping the
elements of the partition to tuple positions and the number of
ways of choosing the eligible tuples among the n candidates (one
from each cell) in each position.

Given k, n, and p, we wish to calculate the expected maximum
collision length at any of the p tuple positions if exactly k of
the n*p tuples are eligible. Let s_i be the number of tuple
positions at which there are exactly i eligible tuples.

For each j from 1 to n, calculate Q(j) as follows:

    Initialize Q(j) to 0.

    For each distinct vector of non-negative integers,
    S = <s_0, s_1, s_2, ..., s_j> such that

        1) \( \sum_{i=0}^{j} i \cdot s_i = k \)

        2) \( \sum_{i=0}^{j} s_i = p \)

    and 3) \( s_j > 0 \),

    add to Q(j) the quantity

        \[ \frac{p!}{\prod_{i=0}^{j} s_i!} \; \prod_{i=0}^{j} \binom{n}{i}^{s_i} \]

Calculate P(j) = the probability that the maximum collision
length is j:

    \[ P(j) = Q(j) \Big/ \binom{np}{k} \]

Calculate the expected maximum collision length as:

    \[ E[\text{mcl}] = \sum_{j=1}^{n} P(j) \cdot j \]

Figure 4.2 Algorithm for calculating expected maximum collision length.
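
The quantities Q(j) and P(j) of Figure 4.2 can be computed without listing the vectors S one by one. The sketch below swaps in an equivalent counting technique, which is an assumption of this example and not the procedure of the figure: the number of ways to choose k eligible tuples with at most j in every tuple position is the coefficient of z^k in (sum over i = 0..j of C(n,i) z^i)^p, and Q(j) is the difference between that count for j and for j-1. Function names are hypothetical.

```python
from math import comb

def count_at_most(j, n, p, k):
    """Ways to pick k eligible tuples with at most j of them in every tuple position.

    Coefficient of z**k in (sum_{i=0..j} C(n, i) z**i) ** p, built by repeated
    polynomial multiplication truncated at degree k.
    """
    poly = [1] + [0] * k                       # the constant polynomial 1
    for _ in range(p):
        nxt = [0] * (k + 1)
        for deg, coeff in enumerate(poly):
            if coeff:
                for i in range(min(j, n, k - deg) + 1):
                    nxt[deg + i] += coeff * comb(n, i)
        poly = nxt
    return poly[k]

def expected_max_collision_exact(n, p, k):
    """Expected maximum collision length when exactly k of the n*p tuples are eligible."""
    total = comb(n * p, k)
    expected = 0.0
    for j in range(1, n + 1):
        q_j = count_at_most(j, n, p, k) - count_at_most(j - 1, n, p, k)   # Q(j)
        expected += j * q_j / total                                        # P(j) * j
    return expected
```

For n = 2, p = 2, k = 2 there are six configurations, four with maximum collision length 1 and two with length 2, giving an expected value of 8/6. Exact integer arithmetic keeps the counts precise, although the cost still grows quickly with n, p and k.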

Figure 4.3 is a plot of expected maximum collision length
versus the percentage of tuples eligible. From the computational
formulae, it is obviously difficult to do the calculations for
large values of n and p. However, when the curve is plotted first
for small values, then for larger and larger values, the curves
clearly converge to the one shown in Figure 4.3. From that
curve, we see that the expected maximum collision length grows
above one only very slowly until about 3% of the tuples are
eligible. When 3% to 30% of the tuples are eligible, the
percentage of cells in the maximum collision length stays well
below the percentage of tuples that are eligible. In most
applications with large data bases, it will be extremely unusual
to wish to retrieve even as many as 3% of the tuples.

Figure 4.3 RAP read-out time

The RAP read-out scheme is limited by having only a single
channel over which to transmit tuples to the support processor.
In certain specialized applications, the quantity of tuples
retrieved may be sufficiently large to justify two or more
parallel channels to transmit tuples from RAP to the support
processor. This can be done in one of two ways: each channel
may be dedicated to a subset of the cells, or, with more expense
and complexity in control logic, each channel could be connected
to each cell. In the former case, each channel is associated
with a cell group (see Figure 4.1). If the cells of the relation
are approximately uniformly distributed across cell groups, then
the expected number of revolutions required for read-out is
approximately the same as if a single channel were used on n/x
cells of x*p tuples each, where x is the number of channels. In
the latter case, any channel can transmit tuples from any cell
group. Then in each revolution, up to x eligible tuples are
transmitted, so that the expected number of revolutions is
reduced by approximately a factor of x whenever the maximum
collision length exceeds x.
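
The two multi-channel arrangements can be compared with a small model. Both functions below are assumptions made for illustration: the shared case supposes that up to x tuples can leave each tuple position in one revolution, and the dedicated case supposes that each cell group is read out independently over its own channel, so the slowest group sets the read-out time. The names are hypothetical.

```python
from math import ceil

def revolutions_shared(counts, x):
    """Every channel reaches every cell: up to x tuples per tuple position per revolution.

    counts[t] is the number of eligible tuples at tuple position t.
    """
    return max((ceil(c / x) for c in counts), default=0)

def revolutions_dedicated(group_counts):
    """One channel per cell group: the group with the longest collision decides.

    group_counts[g][t] is the number of eligible tuples of group g at position t.
    """
    return max((max(counts, default=0) for counts in group_counts), default=0)
```

With collision counts of 5, 1, 0 and 2, a single channel needs 5 revolutions and two shared channels need 3, which illustrates why the gain only appears once the maximum collision length exceeds the number of channels.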

Although  it  is  clear  that  parallel  channels  can  significantly 
reduce  the  number  of  revolutions  required  for  read-out,  the 
environments  anticipated  for  RAP  are  seldom  expected  to  require 
the  transmission  of  more  than  a  very  few  percent  of  the  tuples  of 
any  relation.  Thus  extra  channels  would  not  yield  sufficiently 
frequent  savings  to  justify  their  associated  complexity  and  cost. 

5.  CONCLUDING  REMARKS 

The  RAP  architecture  is  designed  specifically  for  the  support 
of  data  base  management  systems.  In  this  paper,  we  have 
discussed  three  areas  in  which  the  characteristics  of  specific 
data  base  application  environments  influence  the  design  of 
certain  architectural  features  of  RAP. 

We anticipate that RAP-like devices will prove most useful in
data base environments with common identifiable characteristics.
In light of these characteristics, we have refined the design of
RAP in three areas. Because we expect high variance in the
service times required by various requests and because rapid
response to short requests is likely to be important, we have
designed RAP to permit preemptive-resume scheduling of requests
in two priority classes. Because we expect some data bases to
exceed RAP's storage capacity, we have designed RAP to have a
virtual memory facility through which portions of a data base may
be moved to and from RAP memory concurrently with processing of
other portions of the data base already resident in RAP. Because
we expect the channel connecting RAP to its support processor to
be a critical resource of the system, we have designed RAP's
read-out scheme to assure read-out of an eligible tuple in every
tuple position in which an eligible tuple remains to be read out.

While  we  have  arrived  at  a  single  design  for  the  prototype 
implementation  of  RAP,  this  study  will  permit  us  to  tailor  later 
versions  of  RAP  to  various  data  base  application  environments. 

APPENDIX A


Notation

Li = Poisson arrival rate of class i,
mi = mean class i processing time,
ti = expected value of the square of the class i processing time,
ri = load of class i = Li*mi, and
E[Ti] = expected time in system for class i.

Expected times in system:

a) No priority: FCFS scheduling

Class-1:

\[ E[T_1^{\text{no-pri}}] = \frac{L_1 t_1 + L_2 t_2}{2(1 - r_1 - r_2)} + m_1 \]

Class-2:

\[ E[T_2^{\text{no-pri}}] = \frac{L_1 t_1 + L_2 t_2}{2(1 - r_1 - r_2)} + m_2 \]

b) Non-preemptive priority scheduling

Class-1:

\[ E[T_1^{\text{non-pre}}] = \frac{L_1 t_1 + L_2 t_2}{2(1 - r_1)} + m_1 \]

Class-2:

\[ E[T_2^{\text{non-pre}}] = \frac{L_1 t_1 + L_2 t_2}{2(1 - r_1)(1 - r_1 - r_2)} + m_2 \]

c) Preemptive-resume scheduling

Class-1:

\[ E[T_1^{\text{pre-rsm}}] = \frac{L_1 t_1}{2(1 - r_1)} + m_1 \]

Class-2:

\[ E[T_2^{\text{pre-rsm}}] = \frac{L_1 t_1 + L_2 t_2}{2(1 - r_1)(1 - r_1 - r_2)} + \frac{m_2}{1 - r_1} \]

d) Preemptive-restart-without-resampling scheduling

Class-1:

\[ E[T_1^{\text{pre-rst}}] = \frac{L_1 t_1}{2(1 - r_1)} + m_1 \]

Class-2:

\[ E[T_2^{\text{pre-rst}}] = E[c_2] + \frac{L_1 t_1}{2(1 - r_1)^2} + \frac{L_2 E[c_2^2]}{2(1 - L_2 E[c_2])} \]

where

\[ E[c_2] = (1/L_1 + E[b_1]) \, (E[\exp(L_1 m_2)] - 1) \]

\[ E[c_2^2] = 2 (1/L_1 + E[b_1])^2 \, E[(\exp(L_1 m_2) - 1)^2] + (E[b_1^2] + 2E[b_1]/L_1 + 2/L_1^2) \, (E[\exp(L_1 m_2)] - 1) - 2 (E[b_1] + 1/L_1) \Big[ \int_0^{\infty} m_2 \exp(L_1 m_2) f(x_2) \, dx_2 \Big] \]

where f(x2) is the density function of the class-2 processing time
distribution, and b1 is the busy period initiated by one class-1
arrival, whose first two moments are given by:

\[ E[b_1] = \frac{m_1}{1 - r_1}, \quad \text{and} \]

\[ E[b_1^2] = \frac{t_1}{(1 - r_1)^3} \]

For all the disciplines considered, the overall expected time
in system, E[Tov], will be expressed as the weighted combination
of class-1 and class-2 times, that is:

\[ E[T_{ov}] = \frac{L_1 E[T_1] + L_2 E[T_2]}{L_1 + L_2} \]

The ti's are calculated with respect to:

\[ t_i = V_i + m_i^2 \]

where Vi is the variance for class i, and

\[ V = k m^2 \quad \text{for Erlang-}k \]

\[ V = a(2 - a) u^2 - 2a(1 - a) u v + (1 - a^2) v^2 \quad \text{for the hyperexponential distribution.} \]
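
The formulas for disciplines a) to c) and the overall weighted time can be evaluated with a few lines of Python. This is a sketch under the notation of this appendix (L for arrival rates, m for mean processing times, t for second moments); the function names and the dictionary keys are hypothetical, and discipline d) is left out because it needs the full class-2 processing time distribution.

```python
def times_in_system(L1, m1, t1, L2, m2, t2):
    """Expected times in system (class 1, class 2) under disciplines a) to c)."""
    r1, r2 = L1 * m1, L2 * m2
    if r1 + r2 >= 1:
        raise ValueError("total load r1 + r2 must be below 1 for a steady state")
    w = L1 * t1 + L2 * t2
    fcfs_wait = w / (2 * (1 - r1 - r2))
    return {
        "no-pri":  (fcfs_wait + m1, fcfs_wait + m2),
        "non-pre": (w / (2 * (1 - r1)) + m1,
                    w / (2 * (1 - r1) * (1 - r1 - r2)) + m2),
        "pre-rsm": (L1 * t1 / (2 * (1 - r1)) + m1,
                    w / (2 * (1 - r1) * (1 - r1 - r2)) + m2 / (1 - r1)),
    }

def overall(L1, L2, T1, T2):
    """Arrival-rate weighted combination of the class-1 and class-2 times."""
    return (L1 * T1 + L2 * T2) / (L1 + L2)
```

As a check, a single exponential class with arrival rate 0.5 and mean 1 (so t1 = 2) gives an FCFS time in system of 2, the familiar M/M/1 value 1/(1 - 0.5). With short class-1 and long class-2 requests, the class-1 time under preemptive-resume is smaller than under non-preemptive priority, since class-2 second moments no longer appear in it.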